Policy Search in Continuous Action Domains: an Overview
Authors
Abstract
Continuous action policy search, the search for efficient policies in continuous control tasks, is currently the focus of intensive research, driven both by the recent success of deep reinforcement learning algorithms and by the emergence of competitors based on evolutionary algorithms. In this paper, we present a broad survey of policy search methods, incorporating these very different approaches into a common big picture, together with alternatives such as Bayesian Optimization and directed exploration methods. The main message of this overview concerns the relationships between the families of methods, but we also outline some factors underlying the sample efficiency properties of the various approaches. Finally, to keep this survey as short and didactic as possible, we do not go into the details of the mathematical derivations of the elementary algorithms.
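As an illustration of the generic setting the survey covers, the sketch below shows a minimal episodic, gradient-free policy search loop over the parameters of a deterministic linear policy. The toy double-integrator environment, the (1+1) hill-climbing update, and all names are illustrative assumptions, not any specific method from the paper.

```python
# A minimal sketch of episodic, gradient-free policy search over the parameters
# of a deterministic linear policy. The toy environment (a 1-D point mass that
# must reach the origin) is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(0)

def rollout(theta, horizon=50):
    """Return the episodic return of the linear policy a = theta . [x, v]."""
    x, v = 1.0, 0.0          # initial state: position 1, velocity 0
    ret = 0.0
    for _ in range(horizon):
        a = float(np.clip(theta @ np.array([x, v]), -1.0, 1.0))  # continuous action
        v += 0.1 * a          # simple double-integrator dynamics
        x += 0.1 * v
        ret += -(x ** 2 + 0.1 * a ** 2)   # quadratic cost as negative reward
    return ret

theta = np.zeros(2)           # policy parameters
sigma = 0.5                   # exploration noise in parameter space
for it in range(200):
    candidate = theta + sigma * rng.standard_normal(2)   # perturb parameters
    if rollout(candidate) > rollout(theta):              # keep the better policy
        theta = candidate
print("final parameters:", theta, "return:", rollout(theta))
```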
Similar Papers
Model-based Direct Policy Search (Extended Abstract)
Scaling Reinforcement Learning (RL) to real-world problems with continuous state and action spaces remains a challenge. This is partly because the optimal value function can become quite complex in continuous domains. In this paper, we propose not to learn the optimal value function at all, but to use direct policy search methods in combination with model-based RL instead.
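A minimal sketch of this idea, under assumptions not taken from the paper: fit a one-step dynamics model from observed transitions, then score candidate policy parameters purely by imagined rollouts in that model, so no value function is ever learned. The linear model, the toy data, and all names are illustrative.

```python
# Model-based direct policy search sketch: learn a one-step dynamics model,
# then evaluate candidate policies by rollouts inside the learned model.
import numpy as np

rng = np.random.default_rng(1)

# Transitions (s, a, s') collected from the true system (unknown to the learner).
def true_step(s, a):
    return 0.9 * s + 0.2 * a + 0.01 * rng.standard_normal()

S = rng.uniform(-1, 1, size=200)
A = rng.uniform(-1, 1, size=200)
S_next = np.array([true_step(s, a) for s, a in zip(S, A)])

# Model learning: least-squares fit of s' ≈ w_s * s + w_a * a.
X = np.stack([S, A], axis=1)
w_s, w_a = np.linalg.lstsq(X, S_next, rcond=None)[0]

def imagined_return(k, s0=1.0, horizon=30):
    """Evaluate the linear policy a = -k * s inside the learned model."""
    s, ret = s0, 0.0
    for _ in range(horizon):
        a = -k * s
        s = w_s * s + w_a * a        # model prediction, no real interaction
        ret += -(s ** 2 + 0.1 * a ** 2)
    return ret

# Direct policy search (a coarse grid here) using only the model for evaluation.
gains = np.linspace(0.0, 5.0, 51)
best_k = gains[np.argmax([imagined_return(k) for k in gains])]
print("selected gain:", best_k)
```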
Big Tobacco, Alcohol, and Food and NCDs in LMICs: An Inconvenient Truth and Call to Action; Comment on “Addressing NCDs: Challenges From Industry Market Promotion and Interferences”
In their editorial, Tangcharoensathien et al.1 describe the challenges of industry market promotion and policy interference from Big Tobacco, Alcohol, and Food in addressing non-communicable diseases (NCDs). They provide an overview of the increasing influence of corporate interest in emerging eco...
Guided exploration in gradient based policy search with Gaussian processes
Applying reinforcement learning (RL) algorithms in robotic control proves to be challenging even in simple settings with a small number of states and actions. Value function based RL algorithms require the discretization of the state and action space, a limitation that is not acceptable in robotic control. The necessity to be able to deal with continuous state-action spaces led to the use of dif...
Planning with Continuous Resources in Stochastic Domains
We consider the problem of optimal planning in stochastic domains with metric resource constraints; past work has dealt with various variants of this problem. Our goal is to generate a policy whose expected sum of rewards is maximized for a given initial state. We consider a general formulation motivated by our application domain, planetary exploration, in which the choice of an action at each st...
Mean Actor Critic
We propose a new algorithm, Mean Actor-Critic (MAC), for discrete-action continuous-state reinforcement learning. MAC is a policy gradient algorithm that uses the agent’s explicit representation of all action values to estimate the gradient of the policy, rather than using only the actions that were actually executed. This significantly reduces variance in the gradient updates and removes the n...
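A minimal sketch of the contrast MAC draws, for a softmax policy over discrete actions at a single state; the feature vector, Q-values, and parameter shapes are illustrative assumptions, not the paper's code. The sampled-action estimator uses only the executed action, while the all-action estimator averages over every action and so removes that source of variance.

```python
# Sampled-action vs. all-action ("mean") policy-gradient estimates for a
# softmax-linear policy over discrete actions at a single state.
import numpy as np

rng = np.random.default_rng(2)

n_actions, n_features = 3, 4
theta = rng.standard_normal((n_actions, n_features))   # policy parameters
phi = rng.standard_normal(n_features)                  # features of the state
Q = np.array([1.0, 2.0, 0.5])                          # action values at that state

def pi(theta):
    """Softmax policy π(·|s) with logits θ_a · φ(s)."""
    logits = theta @ phi
    z = np.exp(logits - logits.max())
    return z / z.sum()

def grad_log_pi(theta, a):
    """∇_θ log π(a|s) for the softmax-linear policy."""
    p = pi(theta)
    g = -np.outer(p, phi)
    g[a] += phi
    return g

p = pi(theta)

# Sampled-action estimator: Q(s, a) ∇ log π(a|s) for one executed action a.
a = rng.choice(n_actions, p=p)
sampled_grad = Q[a] * grad_log_pi(theta, a)

# All-action estimator: sum_a π(a|s) Q(s, a) ∇ log π(a|s); no sampling over actions.
mean_grad = sum(p[a] * Q[a] * grad_log_pi(theta, a) for a in range(n_actions))

print("sampled-action estimate:\n", sampled_grad)
print("all-action estimate:\n", mean_grad)
```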